CUDA: update build CTK version to 12.8 #13360

Open · wants to merge 3 commits into master from github-workflow/update-cuda-12.8

Conversation

thevishalagarwal
Contributor

@thevishalagarwal commented May 7, 2025

Update the CUDA Toolkit version from 12.4 to 12.8 to support compilation for real arch sm120 (Blackwell GPUs).

  • updated ggml-cuda/CMakeLists.txt to add sm120 to the list of compiled architectures for Blackwell GPUs
  • updated the CUDA Toolkit version from 12.4 to 12.8 for the Windows GitHub CI build
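For illustration, the CMakeLists.txt change presumably amounts to adding a 120 entry to the CUDA architecture list; the exact list in ggml-cuda/CMakeLists.txt may differ from this sketch. In CMake's CMAKE_CUDA_ARCHITECTURES syntax, a "-real" suffix embeds pre-compiled SASS for that arch only, while a bare entry embeds both SASS and PTX:

```cmake
# Sketch only: the actual architecture list in ggml-cuda/CMakeLists.txt
# may differ. Requires CMake >= 3.18 for the -real/-virtual suffixes.
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    # sm_120 (Blackwell) needs CUDA Toolkit 12.8 or newer; "120-real"
    # embeds pre-compiled SASS so RTX 50 cards skip JIT at load time.
    set(CMAKE_CUDA_ARCHITECTURES "50;61;70;75;80;120-real")
endif()
```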

@github-actions bot added the Nvidia GPU, devops, and ggml labels May 7, 2025
@thevishalagarwal force-pushed the github-workflow/update-cuda-12.8 branch from e1db936 to c54c98f on May 12, 2025
@thevishalagarwal
Contributor Author

@JohannesGaessler @slaren @ggerganov ping for review

@slaren
Member

slaren commented May 14, 2025

I am not sure that we need to add real arch 120 to the build. The criterion for selecting which real archs to include is what we expect to be the most commonly used GPUs, to improve load time in those cases, but at this point there are likely very few people with RTX 50 series GPUs: below 1% according to the Steam hardware survey.
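For background on the trade-off raised here (editorial summary, not from the thread): real archs embed pre-compiled SASS, while PTX-only archs are JIT-compiled by the driver on first load, which is the load-time cost being weighed. In CMake terms, the two options look roughly like:

```cmake
# Hypothetical comparison of the two approaches under discussion.
# Option A: ship sm_120 SASS (fast first load on RTX 50, larger binary)
set(CMAKE_CUDA_ARCHITECTURES "80-real;120-real")
# Option B: ship only PTX for a baseline arch; the driver JIT-compiles
# it for newer GPUs such as sm_120, costing time on first load
set(CMAKE_CUDA_ARCHITECTURES "80-real;80-virtual")
```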

@thad0ctor

I am not sure that we need to add real arch 120 to the build. The criterion for selecting which real archs to include is what we expect to be the most commonly used GPUs, to improve load time in those cases, but at this point there are likely very few people with RTX 50 series GPUs: below 1% according to the Steam hardware survey.

Why not? The percentage of people using Blackwell for AI, particularly 5090s and the RTX 6000, is probably disproportionate to the overall Steam hardware survey.

thad0ctor added a commit to thad0ctor/llama.cpp that referenced this pull request Jul 2, 2025
- Add comprehensive 22-week implementation roadmap for Blackwell (compute capability 12.0)
- Include detailed technical specifications with code examples
- Focus on Flash Attention optimizations using Thread Block Clusters
- Plan leverages enhanced L2 cache (126 MB) and HBM3/HBM3e memory
- Build foundation already complete via PR ggml-org#13360 (CUDA 12.8 + sm120)
- Target 20-40% Flash Attention improvement over Ada Lovelace

Phase 1: Foundation and architecture detection (accelerated - complete)
Phase 2: Thread Block Clusters implementation
Phase 3: Flash Attention Blackwell optimizations
Phase 4-7: Advanced features, validation, and integration